Large-scale Cloze Test Dataset Designed by Teachers

Authors

Qizhe Xie

Guokun Lai

Zihang Dai

Eduard H. Hovy

Abstract

The cloze test is widely adopted in language exams to evaluate students’ language proficiency. In this paper, we propose the first large-scale human-designed cloze test dataset, CLOTH, in which the questions were used in middle-school and high-school language exams. With the blanks carefully created by teachers and candidate choices purposely designed to be confusing, CLOTH requires deeper language understanding and a wider attention span than previous automatically generated cloze datasets. We show that humans outperform specially designed baseline models by a significant margin, even when the model is trained on sufficiently large external data. We investigate the source of the performance gap, trace model deficiencies to some distinct properties of CLOTH, and identify the limited ability to comprehend long-term context as the key bottleneck. In addition, we find that human-designed data leads to a larger gap between the model’s performance and human performance than automatically generated data does.

Similar resources

Dataset for the First Evaluation on Chinese Machine Reading Comprehension

Machine Reading Comprehension (MRC) has become enormously popular recently and has attracted a lot of attention. However, existing reading comprehension datasets are mostly in English. To add diversity in reading comprehension datasets, in this paper we propose a new Chinese reading comprehension dataset for accelerating related research in the community. The proposed dataset contains two diff...

Who did What: A Large-Scale Person-Centered Cloze Dataset

We have constructed a new “Who-did-What” dataset of over 200,000 fill-in-the-gap (cloze) multiple choice reading comprehension problems constructed from the LDC English Gigaword newswire corpus. The WDW dataset has a variety of novel features. First, in contrast with the CNN and Daily Mail datasets (Hermann et al., 2015) we avoid using article summaries for question formation. Instead, each pro...

Quasar: Datasets for Question Answering by Search and Reading

We present two new large-scale datasets aimed at evaluating systems designed to comprehend a natural language query and extract its answer from a large corpus of text. The QUASAR-S dataset consists of 37000 cloze-style (fill-in-the-gap) queries constructed from definitions of software entity tags on the popular website Stack Overflow. The posts and comments on the website serve as the backgroun...

A Selection Strategy to Improve Cloze Question Quality

We present a strategy to improve the quality of automatically generated cloze and open cloze questions which are used by the REAP tutoring system for assessment in the ill-defined domain of English as a Second Language vocabulary learning. Cloze and open cloze questions are fill-in-the-blank questions with and without multiple choice, respectively. The REAP intelligent tutoring system [1] uses ...

Improving Cloze Test Performance of Language Learners Using Web N-Grams

We study the effectiveness of search engines for common usage, a new category of search engines that exploit n-gram frequencies on the web to measure the commonness of a formulation, and that allow their users to submit wildcard queries about formulation uncertainties often encountered in the process of writing. These search engines help to resolve questions on common prepositions following ver...

Journal: CoRR

Volume: abs/1711.03225

Publication date: 2017
